-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate the unstable ptr_to_from_bits
feature
#95583
Conversation
The strict provenance APIs are a better version of these, and things like `.addr()` work for the "that cast looks sketchy" case even if the full strict provenance stuff never happens. So I think it's fine to move away from these, and encourage the others instead.
(rust-highfive has picked a reviewer for you, use r? to override) |
Made my thing use |
My use case linked above concerns a "WASM in the browser" context. It essentially consists of handing ownership of a value completely to the browser host, by attaching it to a custom element object. At a high level the steps are something like this:
(I don't use this process only for custom elements, I use essentially this same concept for "custom events" as well.) In my understanding of "provenance", I think that in this case the browser becomes part of the provenance chain of custody. Our "box" can continue to live owned solely by the host with no remaining references from the Rust-WASM program (e.g. the custom element that owns our box is attached to the DOM). The user may reobtain a reference to the custom element object at a later point (e.g. via a Rust wrapper around Javascript's I don't think the browser currently can, or will ever be able to, give the required provenance guarantees. My current understanding of "strict provenance" is that if it we're to become enforced by the compiler, this would mean that this use-case becomes impossible (because it goes beyond just removing |
Also note #95588 -- the intent (EDIT: or at least, my intent) with |
@RSSchermer I believe this can be as simple as "you should pass Rust pointers to FFI" (and let the other side do the serialization/deserialization by accessing arbitrary parts of wasm linear memory). Once you leave the wasm VM, the "host" side of that VM (JS code in the browser) has the equivalent access of a CHERI kernel (which can synthesize any possible userspace capability - this is needed for swapping RAM to disk then bringing it back into RAM later) or a very permissive DDC (default data capability). In fact, I believe that on CHERI, a wasm VM would explicitly have a single pointer (capability) that covers the entire wasm linear memory (effectively a Rust Provenance semantics actually go away a bit sooner, as wasm itself allows any code access to any part of the linear memory with no UB (lack of UB being part of the design of wasm), but only something like
AFAIK what's up for debate is "integer-pointer casting", whereas transmuting between pointers and integers has been on much shakier ground already - I think that's because even if ptr2int2ptr is supported, it's helpful to have that explicit "escape to integer" step to opt into it, whereas |
@MiSawa I believe you need to be careful in the physical address to pointer mapping.
It's tricky to fully understand the scope of this, but IMO if something is supposed to be a "physical address" that needs to be adjusted for a virtual window, then it shouldn't even be read from memory as a pointer. If it is a pointer in the same address space, then the trait AcpiHandler {...}
impl<H: AcpiHandler> AcpiTables<H> {
pub unsafe fn from_rsdp(
handler: H,
rsdp_address: usize
) -> Result<AcpiTables<H>, AcpiError> {...}
} needs to become: trait AcpiHandler {
type PhysicalAddr: Copy;
...
}
impl<H: AcpiHandler> AcpiTables<H> {
pub unsafe fn from_rsdp(
handler: H,
rsdp_address: H::PhysicalAddr
) -> Result<AcpiTables<H>, AcpiError> {...}
} |
@RSSchermer I can't quite imagine what that all looks like from a Rust perspective, but if (as @eddyb indicated) you pass opaque data as an array of If it must be an integer then indeed you are outside the realm of strict provenance. As long as you explicitly cast (and do not transmute) between pointers and integers, the intent is that this is allowed, but it might not work with things like CHERI or Miri. |
@eddyb If I'm understanding you correctly, then - using my case as an example - if you wanted to do this serialization/deserializaiton with minimal context switches, it might look something like this: struct CustomElementData {
address_ptr: *const (),
vtable_ptr: *const (),
type_id: u64
}
#[wasm_bindgen]
fn serialize_in_js(linear_memory_offset: *mut CustomElementData, size: u32) {}
#[wasm_bindgen]
fn deserialize_in_js(linear_memory_offset: *mut CustomElementData) {}
let some_custom_element_data = Box::new(CustomElementData { ... });
let linear_memory_offset = some_custom_element_data.into_raw();
let size = mem::size_of::<CustomElementData>();
serialize_in_js(linear_memory_offset, size as u32);
unsafe {
mem::drop(Box::from_raw(linear_memory_offset));
}
// Then later...
let uninit_custom_element_data = Box::new(CustomElementData {...});
let linear_memory_offset = uninit_custom_element_data.into_raw();
deserialize_in_js(linear_memory_offset);
// `uninit_custom_element_data` is now initialized with the deserialized data!
let init_custom_element_data = unsafe { Box::from_raw(linear_memory_offset) }; And in JS: function serialize_in_js(linear_memory_offset, size) {
// Create a view the relevant region of the WASM linear memory buffer. I'm not currently sure how
// one would obtain the reference to the correct WASM linear memory buffer.
let view = new Uint8Array(wasm_linear_memory, linear_memory_offset, size);
// Copy it to a new non-view Uint8Array
let serialized = new Uint8Array(view);
// Store it somewhere...
}
function deserialize_in_js(linear_memory_offset) {
// ... retrieve the serialized buffer from wherever it was stored
let serialized = ...;
// Overwrite the target region in WASM linear memory buffer with the serialized buffer
wasm_linear_memory.set(serialized, linear_memory_offset);
} I think something like that might work. I'm still left with the question of how to obtain the reference to the relevant linear memory buffer. Is there/could there be a special Alternatively, if I understand you correctly, if I could somehow obtain the "WASM base pointer" (the zero pointer?) for the current WASM linear memory in Rust, then I could deserialize "on the Rust side" using that base pointer as provenance? // Lets assume there is some constant that represents this "WASM linear memory base pointer"
let my_address_ptr: *const () = WASM_BASE_POINTER.with_addr(my_address_ptr_bits); At least it seems the provenance situation in WASM is much simpler than I initially feared in my limited understanding of the concept. My case would be addressed if either of these solutions could be made to work (or perhaps there is still a better solution altogether?). As to removing |
Yeah, that looks about right.
It's part of the "wasm VM state". In the browser, according to the docs on MDN, it seems that you can only access the Looking around I did find that |
@eddyb Ah hmm, yeah it's weird. It uses |
Note that provenance is a Rust concept. x86 doesn't have provenance any more than WASM does. So I don't think the fact that you are on WASM changes anything here. vtable metadata makes things more tricky so I'll just consider pointer-sized arbitrary data being handed by the 'outside world' for now. If you have an FFI like extern {
fn store_data(..., data: usize);
fn register_callback(..., callback: fn(usize)); // gets called with the `data`
} then you cannot use strict provenance since you are forced to turn the data into an integer before giving it away. But you can still do register_callback(..., |addr| {
let ptr = addr as *mut T;
...
}); because round-tripping a pointer through an integer still works under 'permissive provenance'. You just need to use a ptr2int operation that explicitly tells the compiler that you will later cast this integer back to a pointer and not give the language any help re: the provenance of that new pointer. The intent is to add Of course, if you can actually adjust the FFI (since it seems to be partially under your control) to actually use pointer types on the FFI boundary, that's even better as then provenance will be preserved. :) |
The issue is, it is not possible to serialize a pointer into an integer without losing information. Specifically, the provenance gets lost. So there are two options
|
…n the Rust side Should avoid potential provenance issues, see rust-lang/rust#95583
wasm_bindgen::memory indeed turned out to be exactly what was needed, thank you for making me aware of its existance @eddyb. It returns a WebAssembly.Memory object that contains the linear memory buffer for the calling WASM code. For reference, here's a modified version of the outline above that better reflects what I ended up doing: struct CustomElementData {
address: *mut (),
metadata: DynMetadata<dyn Any>,
type_id: TypeId
}
#[wasm_bindgen]
fn serialize_custom_element_data(
wasm_memory: &JsValue,
offset: *mut CustomElementData,
size: u32
) -> js_sys::Uint8Array {}
#[wasm_bindgen]
fn deserialize_custom_element_data(
wasm_memory: &JsValue,
offset: *mut CustomElementData,
serialized: &js_sys::Uint8Array
) {}
let mut custom_element_data = CustomElementData { ... };
let offset = &mut custom_element_data as *mut CustomElementData;
let size = mem::size_of::<CustomElementData>();
let serialized = serialize_custom_element_data(&wasm_bindgen::memory(), offset, size as u32);
// Explicitly drop here so it doesn't drop early...
mem::drop(custom_element_data);
// Then later...
let uninit_custom_element_data = MaybeUninit::<CustomElementData>::uninit();
let offset = uninit_custom_element_data.as_mut_ptr();
deserialize_custom_element_data(&wasm_bindgen::memory(), offset, &serialized);
// `uninit_custom_element_data` is now initialized with the deserialized data!
let init_custom_element_data = unsafe { uninit_custom_element_data.assume_init() }; And in JS: function serialize_custom_element_data(
wasm_memory,
offset,
size
) {
// Create a view the relevant region of the WASM linear memory buffer.
let view = new Uint8Array(wasm_memory.buffer, offset, size);
// Copy it to a new non-view Uint8Array and return
return new Uint8Array(view);
}
function deserialize_custom_element_data(
wasm_memory,
offset,
serialized
) {
// View WASM memory as bytes
let buffer_view = new Uint8Array(wasm_memory.buffer);
// Overwrite the memory at the pointer offset with the `serialized` data
buffer_view.set(serialized, offset);
} (Link to the actual commit in case someone really wants the nitty gritty.) Note that earlier I had assumed you'd have to box the data that you want to "serialize", because I seemed to recall that stack memory was not part of the linear memory buffer, but my impressions seems either wrong or I'm misunderstanding something about what's really going on under the hood here, because empirically it does work without boxing. Also note that my NINJA EDIT: I came accross this post, which gives a high level explanation as to why this works without boxing, even though the actual stack memory is indeed separate (in short: a "shadow stack" is maintained in linear memory); figured I'd link it here for completeness. |
Changing to those doesn't introduce any new unsoundness over the existing ones, so they're the better "if you won't want to think about it" replacement. But also mention the strict provenance APIs, as that's what we'd rather they use instead.
I've updated the deprecation message to point at the new That way the "I don't want to think about it" replacement is sound, but hopefully the mention of the strict provenance APIs (and the good docs on EDIT: #95588 landed, so this is unblocked now. |
Heh, glad it's working, and that you found an explanation. The stack in linear memory (used for variables that have their address taken, i.e. that could stay hidden in CPU registers when compiling natively) is the kind of thing you'd call "the (data) stack", as it's accessible with normal pointers, and "shadow stack" would typically refer to managed/hidden stacks, like:
(One of the places where I've seen "shadow" used is the AddressSanitizer "shadow memory", used to hold metadata about allocations) |
The code in question is generally well beyond my understanding, but yes my best guess is that the referenced "memory" does "come from the outside" (whether there is even memory at that address is not clear to me). As far as I can tell, the address comes from the magic system-level entrypoint related to UEFI. Effectively, an argument to Whether there are also references in play I have no idea. Miri would know, but I think getting Miri to execute programs like this won't happen for a while yet. |
@RalfJung Good point. I've updated the OP to not (essentially) say "use strict provenance instead" like it did before. And the deprecation warnings on the methods point to the @joshtriplett I think this PR is good to be reviewed now that we've had the meeting. |
@rust-lang/libs-api this PR has been waiting for review for a while, would be nice to see it move ahead. :) |
☔ The latest upstream changes (presumably #104655) made this pull request unmergeable. Please resolve the merge conflicts. |
3cfface
to
4d37d1f
Compare
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
I pushed a rebase to resolve conflict with #104632. |
@bors r+ |
@bors r+ |
Thanks, @dtolnay ! |
…earth Rollup of 7 pull requests Successful merges: - rust-lang#83608 (Add slice methods for indexing via an array of indices.) - rust-lang#95583 (Deprecate the unstable `ptr_to_from_bits` feature) - rust-lang#101655 (Make the Box one-liner more descriptive) - rust-lang#102207 (Constify remaining `Layout` methods) - rust-lang#103193 (mark sys_common::once::generic::Once::new const-stable) - rust-lang#104622 (Use clang for the UEFI targets) - rust-lang#104638 (Move macro_rules diagnostics to diagnostics module) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
…cottmcm Remove (deprecated & unstable) {to,from}_bits pointer methods These unstable methods have been deprecated for more than a year (since rust-lang#95583). Remove them. See rust-lang#91126 (comment) and https://github.com/rust-lang/rust/pull/110441/files#r1169574509. Closes rust-lang#91126. r? `@scottmcm`
Rollup merge of rust-lang#127071 - Sky9x:remove-ptr-to-from-bits, r=scottmcm Remove (deprecated & unstable) {to,from}_bits pointer methods These unstable methods have been deprecated for more than a year (since rust-lang#95583). Remove them. See rust-lang#91126 (comment) and https://github.com/rust-lang/rust/pull/110441/files#r1169574509. Closes rust-lang#91126. r? `@scottmcm`
I propose that we deprecate the (unstable!)
to_bits
andfrom_bits
methods on raw pointers. (With the intent toremove them onceremove them fairly soon after, as the strict and expose versions have been around for a while already.)addr
has been around long enough to make the transition easy on people -- maybe another 6 weeksThe APIs that came from the strict provenance explorations (#95228) are a more holistic version of these, and things like
.expose_addr()
work for the "that cast looks sketchy" case even if the full strict provenance stuff never happens. (As a bonus,addr
is even shorter thanto_bits
, though it is only applicable if people can use full strict provenance!addr
is not a direct replacement forto_bits
.) So I think it's fine to move away from the{to|from}_bits
methods, and encourage the others instead.That also resolves the worry that was brought up (I forget where) that
q.to_bits()
and(*q).to_bits()
both work ifq
is a pointer-to-floating-point, as they also have ato_bits
method.Tracking issue #91126
Code search: https://github.com/search?l=Rust&p=1&q=ptr_to_from_bits&type=Code
For potential pushback, some users in case they want to chime in